Discovering Lexical Generalisations. A Supervised Machine Learning Approach to Inheritance Hierarchy Construction
Abstract
Grammar development over the last decades has seen a shift away from large inventories of grammar rules towards richer lexical structures. Many modern grammar theories are highly lexicalised. But simply listing lexical entries typically results in an undesirable amount of redundancy. Lexical inheritance hierarchies, on the other hand, make it possible to capture linguistic generalisations and thereby reduce redundancy. Inheritance hierarchies are usually constructed by hand, but this is time-consuming and often impractical if a lexicon is very large. Constructing hierarchies automatically or semi-automatically also facilitates a more systematic analysis of the lexical data. In addition, lexical data is often extracted automatically from corpora, and this practice is likely to become more widespread over the coming years. It therefore makes sense to go a step further and automate the hierarchical organisation of lexical data as well. Previous approaches to automatic lexical inheritance hierarchy construction tended to focus on minimality, aiming for hierarchies that minimised one or more criteria such as the number of path-value pairs, the number of nodes or the number of inheritance links (Petersen 2001; Barg 1996a; and, in a slightly different context, Light 1994). Aiming for minimality is motivated by the fact that conciseness is a main reason for using inheritance hierarchies in the first place. However, I will argue that minimality-based approaches have several problems. First, minimality is not well defined in the context of lexical inheritance hierarchies, as different minimality criteria are in tension with one another: factoring shared path-value pairs out into a new node, for example, can reduce the number of path-value pairs but adds a node and inheritance links. Second, minimality-based approaches tend to underestimate the importance of linguistic plausibility. While such approaches start with a definition of minimal redundancy and then try to prove that it leads to plausible hierarchies, the approach suggested here takes the opposite direction.
It starts with a manually built hierarchy, to which a supervised machine learning algorithm is applied with the aim of finding a set of formal criteria that can guide the construction of plausible hierarchies. Taking this direction makes it more likely that the selected criteria do in fact lead to plausible hierarchies. Using a machine learning technique also has the advantage that the set of criteria can be much larger than in hand-crafted definitions. Consequently, conciseness can be defined in very broad terms, taking into account interdependencies in the data as well as simple minimality criteria. This leads to a more fine-grained model of hierarchy quality. In practice, the proposed method consists of two components. First, Galois lattices are used to define the search space as the set of all generalisations over the input lexicon. Second, maximum entropy models that have been trained on a manually built hierarchy are applied to the ...
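A Galois lattice (also called a concept lattice in formal concept analysis) over a lexicon can be sketched as follows, assuming each lexical entry is reduced to a flat set of path-value pairs written as strings. The toy lexicon, the helper names (`extent`, `intent`, `concepts`, `cover_edges`, `hierarchy_stats`) and the choice to count path-value pairs by storing at each node only the pairs it does not inherit from its parents are hypothetical illustrations, not the method of this thesis. The enumeration is naive (exponential in the number of entries), and the maximum entropy step is not modelled. The sketch also makes the tension between minimality criteria concrete: the full lattice stores fewer path-value pairs than the flat lexicon, at the cost of extra nodes and inheritance links.

```python
from itertools import combinations

# Hypothetical toy lexicon: each entry is a flat set of path-value pairs.
LEXICON = {
    "walk":  {"cat=verb", "vform=base", "subcat=intrans"},
    "walks": {"cat=verb", "vform=fin", "agr=3sg", "subcat=intrans"},
    "sees":  {"cat=verb", "vform=fin", "agr=3sg", "subcat=trans"},
    "dog":   {"cat=noun", "agr=3sg"},
}


def extent(attrs, lexicon):
    """Entries that carry every path-value pair in attrs."""
    return frozenset(w for w, a in lexicon.items() if attrs <= a)


def intent(words, lexicon):
    """Path-value pairs shared by every entry in words."""
    if not words:
        # By convention the empty set of entries shares every pair.
        return frozenset(set().union(*lexicon.values()))
    return frozenset(set.intersection(*(set(lexicon[w]) for w in words)))


def concepts(lexicon):
    """All formal concepts (extent, intent), found by closing every subset
    of entries. Exponential, so only suitable for toy lexicons."""
    words = list(lexicon)
    found = set()
    for r in range(len(words) + 1):
        for subset in combinations(words, r):
            b = intent(frozenset(subset), lexicon)
            found.add((extent(b, lexicon), b))
    return found


def cover_edges(cs):
    """Inheritance links: (child, parent) pairs where the child's extent is a
    proper subset of the parent's with no concept strictly in between."""
    return [
        (lo, hi)
        for lo in cs
        for hi in cs
        if lo[0] < hi[0] and not any(lo[0] < m[0] < hi[0] for m in cs)
    ]


def hierarchy_stats(lexicon):
    # Drop the bottom concept if its extent is empty; it describes no entry.
    cs = [c for c in concepts(lexicon) if c[0]]
    edges = cover_edges(cs)
    pairs = 0
    for node in cs:
        parents = [hi for lo, hi in edges if lo == node]
        inherited = set().union(*(p[1] for p in parents))
        # Each node stores only the pairs it does not inherit.
        pairs += len(node[1] - inherited)
    return {
        "nodes": len(cs),
        "links": len(edges),
        "path_value_pairs": pairs,
        "flat_pairs": sum(len(a) for a in lexicon.values()),
    }


if __name__ == "__main__":
    print(hierarchy_stats(LEXICON))
    # → {'nodes': 9, 'links': 10, 'path_value_pairs': 7, 'flat_pairs': 13}
```

On this toy lexicon the lattice stores 7 path-value pairs instead of the 13 in the flat list, but needs 9 nodes and 10 inheritance links to do so. In the method summarised above the lattice is only the search space; deciding which of its nodes to keep is the job of the trained maximum entropy models, which this sketch does not attempt.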
Similar resources
Emotion Detection in Persian Text; A Machine Learning Model
This study aimed to develop a computational model for the recognition of emotion in Persian text as a supervised machine learning problem. We used Plutchik's emotion model as the supervised learning criterion and a Support Vector Machine (SVM) as the baseline classifier. We also used the NRC lexicon and contextual features as training data and components of the model. One hundred selected texts including pol...
A Galois Lattice based Approach to Lexical Inheritance Hierarchy Learning
Lexical inheritance hierarchies are used widely as a means of representing lexical information efficiently but there have been few attempts to construct them automatically. This paper presents a two-step construction algorithm in which a Galois lattice is built and then pruned into an inheritance hierarchy. The pruning step utilises a maximum entropy model. This is compared to a pruning method ...
ITRI-00-36 Phonological feature based multilingual lexical description Carole
This paper presents a framework for compactly describing word forms in terms of phonological features. Using a highly modular default-inheritance based approach, the framework supports the description of lexical generalisations traditionally modelled as morphology and phonology in a single phonology-based representation. This representation is more uniform and more detailed than previous approa...
Paraphrase Identification on the Basis of Supervised Machine Learning Techniques
This paper presents a machine learning approach for paraphrase identification which uses lexical and semantic similarity information. In the experimental studies, we examine the limitations of the designed attributes and the behavior of three machine learning classifiers. With the objective to increase the final performance of the system, we scrutinize the influence of the combination of lexica...
An Approach to Building the Hierarchical Element of a Lexical Knowledge Base from a Machine Readable Dictionary
This abstract describes an approach to extracting taxonomies from machine readable dictionaries and using them to structure a lexical knowledge base which incorporates default inheritance. Taxonomy construction is based on an intuitive notion of the organisation of the substantial quantities of data in machine readable dictionaries which were developed for quite independent purposes. Our intent...
Publication date: 2004